
Collaborating Authors

Hong Kong Polytechnic University


A Novel Attention-Augmented Wavelet YOLO System for Real-time Brain Vessel Segmentation on Transcranial Color-coded Doppler

Zhang, Wenxuan, Li, Shuai, Wang, Xinyi, Sun, Yu, Kang, Hongyu, Wan, Pui Yuk Chryste, Qin, Jing, Zhang, Yuanpeng, Zheng, Yong-Ping, Lam, Sai-Kit

arXiv.org Artificial Intelligence

The Circle of Willis (CoW), vital for ensuring consistent blood flow to the brain, is closely linked to ischemic stroke. Accurate assessment of the CoW is important for identifying individuals at risk and guiding appropriate clinical management. Among existing imaging methods, Transcranial Color-coded Doppler (TCCD) offers unique advantages due to its radiation-free nature, affordability, and accessibility. However, reliable TCCD assessments depend heavily on operator expertise for identifying anatomical landmarks and performing accurate angle correction, which limits its widespread adoption. To address this challenge, we propose an AI-powered, real-time CoW auto-segmentation system capable of efficiently capturing cerebral arteries. No prior studies have explored AI-driven cerebrovascular segmentation using TCCD. In this work, we introduce a novel Attention-Augmented Wavelet YOLO (AAW-YOLO) network tailored for TCCD data, designed to provide real-time guidance for brain vessel segmentation in the CoW. We prospectively collected TCCD data comprising 738 annotated frames and 3,419 labeled artery instances to establish a high-quality dataset for model training and evaluation. The proposed AAW-YOLO demonstrated strong performance in segmenting both ipsilateral and contralateral CoW vessels, achieving an average Dice score of 0.901, IoU of 0.823, precision of 0.882, recall of 0.926, and mAP of 0.953, with a per-frame inference speed of 14.199 ms. This system offers a practical solution to reduce reliance on operator experience in TCCD-based cerebrovascular screening, with potential applications in routine clinical workflows and resource-constrained settings. Future research will explore bilateral modeling and larger-scale validation.
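The Dice and IoU scores reported above are standard overlap measures between a predicted and a ground-truth segmentation mask. The following is a minimal NumPy sketch of both, assuming same-shape binary masks. The function name `dice_iou` and the smoothing term `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np


def dice_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for two same-shape binary masks.

    eps keeps both scores defined when both masks are empty
    (two empty masks score 1.0, i.e. perfect agreement).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)


# Toy 1x4 "frame": the prediction covers one extra pixel.
pred = np.array([[1, 1, 0, 0]])
target = np.array([[1, 0, 0, 0]])
dice, iou = dice_iou(pred, target)  # dice ≈ 0.667, iou ≈ 0.5
```

For a single mask, the two scores are linked by Dice = 2 * IoU / (1 + IoU). Averages over many frames need not satisfy this exactly. Plugging the reported mean IoU of 0.823 into the formula gives about 0.903, which is close to, but not identical with, the reported mean Dice of 0.901.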


Ada-FCN: Adaptive Frequency-Coupled Network for fMRI-Based Brain Disorder Classification

Xun, Yue, Xu, Jiaxing, Gao, Wenbo, Yang, Chen, Wang, Shujun

arXiv.org Artificial Intelligence

Resting-state fMRI has become a valuable tool for classifying brain disorders and constructing brain functional connectivity networks by tracking BOLD signals across brain regions. However, existing models largely neglect the multi-frequency nature of neuronal oscillations, treating BOLD signals as monolithic time series. This overlooks the crucial fact that neurological disorders often manifest as disruptions within specific frequency bands, limiting diagnostic sensitivity and specificity. While some methods have attempted to incorporate frequency information, they often rely on predefined frequency bands, which may not be optimal for capturing individual variability or disease-specific alterations. To address this, we propose a novel framework featuring Adaptive Cascade Decomposition to learn task-relevant frequency sub-bands for each brain region and Frequency-Coupled Connectivity Learning to capture both intra- and nuanced cross-band interactions in a unified functional network. This unified network informs a novel message-passing mechanism within our Unified-GCN, generating refined node representations for diagnostic prediction. Experimental results on the ADNI and ABIDE datasets demonstrate superior performance over existing methods. The code is available at https://github.com/XXYY20221234/Ada-FCN.
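Ada-FCN learns its frequency sub-bands. The sketch below shows only the predefined-band baseline that the abstract argues against: each region's BOLD signal is split into fixed bands, and a Pearson connectivity matrix is computed per band. The band edges are the commonly used rs-fMRI "slow-5" (0.01-0.027 Hz) and "slow-4" (0.027-0.073 Hz) ranges, chosen here only as an assumption for illustration. The ideal FFT band-pass and the function names `band_filter` and `band_connectivity` are hypothetical and are not the paper's Adaptive Cascade Decomposition.

```python
import numpy as np

# Example predefined bands in Hz (illustrative assumption, not learned).
BANDS = {"slow-5": (0.01, 0.027), "slow-4": (0.027, 0.073)}


def band_filter(ts, fs, lo, hi):
    """Ideal band-pass: keep FFT bins with lo <= f < hi along the last axis."""
    n = ts.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.fft.rfft(ts, axis=-1)
    spec[..., (freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=n, axis=-1)


def band_connectivity(ts, fs, bands=BANDS):
    """ts: (regions, timepoints) -> {band name: (regions, regions) Pearson matrix}."""
    return {name: np.corrcoef(band_filter(ts, fs, lo, hi))
            for name, (lo, hi) in bands.items()}


# Two synthetic regions sampled at TR = 2 s (fs = 0.5 Hz). They share a
# 0.02 Hz component but carry opposite-sign 0.05 Hz components.
fs = 0.5
t = np.arange(400) / fs
slow = np.sin(2 * np.pi * 0.02 * t)
fast = np.sin(2 * np.pi * 0.05 * t)
ts = np.stack([slow + fast, slow - fast])
conn = band_connectivity(ts, fs)
```

The full-signal correlation between the two regions is about zero, even though they are strongly positively coupled in one band and strongly negatively coupled in the other. This is the kind of band-specific structure that a monolithic treatment of BOLD signals hides.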


GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning

Liu, Bo, Zhao, Xiangyu, He, Along, Chen, Yidi, Fu, Huazhu, Wu, Xiao-Ming

arXiv.org Artificial Intelligence

Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images. While recent advances in multi-modal learning have significantly improved performance, current methods still suffer from limited answer reliability and poor interpretability, impairing the ability of clinicians and patients to understand and trust model outputs. To address these limitations, this work first proposes a Region-Aware Multimodal Chain-of-Thought (RMCoT) dataset, in which the process of producing an answer is preceded by a sequence of intermediate reasoning steps that explicitly ground relevant visual regions of the medical image, thereby providing fine-grained explainability. Furthermore, we introduce a novel verifiable reward mechanism for reinforcement learning to guide post-training, improving the alignment between the model's reasoning process and its final answer. Remarkably, our method achieves comparable performance using only one-eighth of the training data, demonstrating the efficiency and effectiveness of the proposal. The dataset is available at https://www.med-vqa.com/GEMeX/.


Certifiably Optimal Doppler Positioning using Opportunistic LEO Satellites

Song, Baoshan, Wen, Weisong, Zhang, Qi, Xu, Bing, Hsu, Li-Ta

arXiv.org Artificial Intelligence

To provide backup and augmentation for global navigation satellite systems (GNSS), Doppler shifts from Low Earth Orbit (LEO) satellites can be employed as signals of opportunity (SOP) for positioning, navigation, and timing (PNT). Since the Doppler positioning problem is non-convex, local search methods may produce two types of estimates: a global optimum that cannot be recognized as such, or a local optimum resulting from an inexact initial estimate. Because exact initialization is unavailable in some unknown environments, a globally optimal method that requires no initialization becomes necessary. To achieve this goal, we propose a certifiably optimal LEO Doppler positioning method based on convex optimization. In this paper, the certifiable positioning method is implemented through a graduated weight approximation (GWA) algorithm and semidefinite programming (SDP) relaxation. To guarantee optimality, we derive the necessary conditions for optimality in ideal noiseless cases and sufficient noise-bound conditions in noisy cases. Simulations and real-world tests are conducted to evaluate the effectiveness and robustness of the proposed method. Specifically, the real test using Iridium-NEXT satellites shows that the proposed method estimates a certifiably optimal solution with a 3D positioning error of 140 m without initial estimates, while Gauss-Newton and Dog-Leg are trapped in local optima when the initial point is 1000 km or more away from the ground truth. Moreover, the certifiable estimate can also be used to initialize local search methods, reducing the 3D positioning error to 130 m.
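The non-convexity mentioned above comes from the Doppler measurement model. The block below uses the standard range-rate formulation, which is an assumption here; the paper's exact model, its GWA algorithm, and its SDP relaxation are not reproduced. In this formulation, the i-th Doppler shift f_{D,i} at carrier wavelength λ comes from a satellite at position p_i with velocity v_i. It constrains the receiver position p and a clock-drift term, with the receiver velocity v taken as known (v = 0 for a static receiver):

```latex
\dot{\rho}_i = -\lambda f_{D,i}
  = \frac{(\mathbf{p}_i - \mathbf{p})^{\top}(\mathbf{v}_i - \mathbf{v})}
         {\lVert \mathbf{p}_i - \mathbf{p} \rVert}
  + c\,\dot{\delta t} + \varepsilon_i ,
\qquad
(\hat{\mathbf{p}}, \hat{\dot{\delta t}})
  = \arg\min_{\mathbf{p},\,\dot{\delta t}} \sum_{i=1}^{N}
    \left( \dot{\rho}_i
      - \frac{(\mathbf{p}_i - \mathbf{p})^{\top}(\mathbf{v}_i - \mathbf{v})}
             {\lVert \mathbf{p}_i - \mathbf{p} \rVert}
      - c\,\dot{\delta t} \right)^{2}
```

Because p appears inside the normalization by the range, the least-squares objective is non-convex in p. Local iterations such as Gauss-Newton or Dog-Leg that start far from the truth can therefore settle in a local minimum, which is the failure mode the abstract reports for initial points 1000 km or more away.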


ChatMyopia: An AI Agent for Pre-consultation Education in Primary Eye Care Settings

Wu, Yue, Chen, Xiaolan, Zhang, Weiyi, Liu, Shunming, Sum, Wing Man Rita, Wu, Xinyuan, Shang, Xianwen, Kee, Chea-su, He, Mingguang, Shi, Danli

arXiv.org Artificial Intelligence

Funding: The study was supported by the Start-up Fund for RAPs under the Strategic Hiring Scheme (P0048623) from HKSAR, the Global STEM Professorship Scheme (P0046113), and the Henry G. Leong Endowed Professorship in Elderly Vision Health.

Abstract: Large language models (LLMs) show promise for tailored healthcare communication but face challenges in interpretability and multi-task integration, particularly for domain-specific needs such as myopia, and their real-world effectiveness as patient education tools has yet to be demonstrated. Here, we introduce ChatMyopia, an LLM-based AI agent designed to address text- and image-based inquiries related to myopia. To achieve this, ChatMyopia integrates an image classification tool and a retrieval-augmented knowledge base built from literature, expert consensus, and clinical guidelines. A myopic maculopathy grading task, a single question examination, and human evaluations validated its ability to deliver personalized, accurate, and safe responses to myopia-related inquiries with high scalability and interpretability. In a randomized controlled trial (n=70, NCT06607822), ChatMyopia significantly improved patient satisfaction compared to traditional leaflets, enhancing patient education in accuracy, empathy, disease awareness, and communication between patients and eye care practitioners. These findings highlight ChatMyopia's potential as a valuable supplement to enhance patient education and improve satisfaction with medical services in primary eye care settings.

Keywords: Large language model, Medical agent, Myopia, Patient education, Randomized controlled trial.

Introduction: For patients, a lack of basic understanding of their condition before initial consultations can hinder communication, as clinicians may spend time explaining fundamental concepts instead of critical issues, resulting in poor decisions and noncompliance [1, 2]. Therefore, patients require professional information and support to enhance their healthcare experiences.
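ChatMyopia grounds its answers in a retrieval-augmented knowledge base. The abstract does not say how retrieval is performed, so the following is only a minimal sketch of the general retrieve-then-prompt pattern. It uses bag-of-words cosine similarity as a stand-in for whatever retriever the system actually uses. The knowledge-base snippets, the prompt wording, and the function names (`retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k passages most similar to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    return sorted(scored, key=lambda sp: sp[0], reverse=True)[:k]


def build_prompt(query: str, passages: List[str], k: int = 2) -> str:
    """Prepend the retrieved passages so the LLM answers from the knowledge base."""
    context = "\n".join(f"- {p}" for _, p in retrieve(query, passages, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Illustrative knowledge-base entries (hypothetical, not from the paper).
kb = [
    "Myopia is a refractive error in which distant objects appear blurred.",
    "Myopic maculopathy is a complication of high myopia that affects the macula.",
    "More time spent outdoors is associated with a lower risk of myopia onset in children.",
]
prompt = build_prompt("What is myopic maculopathy?", kb, k=1)
```

Restricting the model to retrieved passages is what makes answers traceable to a source. A deployed system would more likely use dense embeddings and a vector index than word counts.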


A computer vision-based model for occupancy detection using low-resolution thermal images

Cui, Xue, Zakka, Vincent Gbouna, Lee, Minhyun

arXiv.org Artificial Intelligence

Occupancy plays an essential role in the energy consumption and operation of heating, ventilation, and air conditioning (HVAC) systems. Traditional HVAC systems typically operate on fixed schedules without considering occupancy, whereas advanced occupant-centric control (OCC) uses occupancy status to regulate HVAC operations. RGB images combined with computer vision (CV) techniques are widely used for occupancy detection; however, the detailed facial and body features they capture raise significant privacy concerns. Low-resolution thermal images offer a non-invasive solution that mitigates these privacy issues. This study developed an occupancy detection model using low-resolution thermal images and CV techniques, applying transfer learning to fine-tune the You Only Look Once version 5 (YOLOv5) model. The developed model achieved satisfactory performance, with precision, recall, and mAP50 values approaching 1.000. The contributions of this model lie not only in mitigating privacy concerns but also in reducing computing resource demands.
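For a detector such as YOLOv5, precision and recall are computed by matching predicted boxes to ground-truth boxes at an IoU threshold. mAP50 is the mean, over classes, of the area under the precision-recall curve obtained at an IoU threshold of 0.5. The following is a simplified sketch of the single-threshold matching step only, with hypothetical names and toy boxes. It is not YOLOv5's own evaluation code, and it omits the confidence sweep needed to build the curve for mAP.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Tuple[Box, float]], gts: List[Box],
                     iou_thr: float = 0.5) -> Tuple[float, float]:
    """Greedy matching: each prediction (highest confidence first) claims the
    best still-unmatched ground-truth box whose IoU is at least iou_thr."""
    matched = set()
    tp = 0
    for box, _conf in sorted(preds, key=lambda pc: pc[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            iou = box_iou(box, gt)
            if j not in matched and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


# Two occupants in a frame; one detection is correct and one is a false alarm.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
p, r = precision_recall(preds, gts)  # p == 0.5, r == 0.5
```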


AI-powered virtual eye: perspective, challenges and opportunities

Wu, Yue, Guo, Yibo, Yan, Yulong, Yang, Jiancheng, Zhou, Xin, Cheng, Ching-Yu, Shi, Danli, He, Mingguang

arXiv.org Artificial Intelligence

We envision the "virtual eye" as a next-generation, AI-powered platform that uses interconnected foundation models to simulate the eye's intricate structure and biological function across all scales. Advances in AI, imaging, and multiomics provide fertile ground for constructing a universal, high-fidelity digital replica of the human eye. This perspective traces the evolution from early mechanistic and rule-based models to contemporary AI-driven approaches, and toward a unified model with multimodal, multiscale, and dynamic predictive capabilities and embedded feedback mechanisms. We propose a development roadmap emphasizing the roles of large-scale multimodal datasets, generative AI, foundation models, agent-based architectures, and interactive interfaces. Despite challenges in interpretability, ethics, data processing, and evaluation, the virtual eye holds the potential to revolutionize personalized ophthalmic care and accelerate research into ocular health and disease.


Tutorial Proposal: Speculative Decoding for Efficient LLM Inference

Xia, Heming, Du, Cunxiao, Li, Yongqi, Liu, Qian, Li, Wenjie

arXiv.org Artificial Intelligence

This tutorial presents a comprehensive introduction to Speculative Decoding (SD), an advanced technique for accelerating LLM inference that has garnered significant research interest in recent years. SD was introduced as an innovative decoding paradigm to mitigate the high inference latency stemming from autoregressive decoding in LLMs. At each decoding step, SD efficiently drafts several future tokens and then verifies them in parallel. Unlike traditional autoregressive decoding, this approach allows multiple tokens to be decoded per step, achieving 2x-4x speedups in LLM inference while preserving the original model's output distribution. This tutorial delves into the latest techniques in SD, including draft model architectures and verification strategies. Additionally, it explores the acceleration potential and future research directions in this promising field. We aim for this tutorial to elucidate the current research landscape and offer insights for researchers interested in Speculative Decoding, ultimately contributing to more efficient LLM inference.
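The draft-then-verify loop described above can be sketched with the widely used speculative-sampling acceptance rule. Under this rule, a drafted token x is accepted with probability min(1, p(x)/q(x)), where p is the target model and q is the draft model. At the first rejection, a replacement token is drawn from the normalized residual max(0, p - q), and the step stops there. This rule is what lets SD reproduce the target model's distribution exactly. The sketch assumes toy "models" that map a context to a next-token distribution over a small shared vocabulary. The names `speculative_step`, `residual`, and `sample` are hypothetical. Real systems work on logits, batch the verification pass, and use the other drafting and verification strategies the tutorial covers.

```python
import random
from typing import Callable, Dict, List

Dist = Dict[str, float]               # token -> probability over a shared vocabulary
Model = Callable[[List[str]], Dist]   # context -> next-token distribution


def sample(dist: Dist, rng: random.Random) -> str:
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def residual(p: Dist, q: Dist) -> Dist:
    """Normalized max(0, p - q): where to resample after a rejected draft."""
    diff = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
    total = sum(diff.values())
    return {t: v / total for t, v in diff.items()}


def speculative_step(target: Model, draft: Model, prefix: List[str], k: int,
                     rng: random.Random) -> List[str]:
    """Draft k tokens, verify them against the target, return the emitted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    ctx, drafted, q_dists = list(prefix), [], []
    for _ in range(k):
        q = draft(ctx)
        x = sample(q, rng)
        drafted.append(x)
        q_dists.append(q)
        ctx.append(x)
    # 2. Target distributions at all k + 1 positions (one batched forward
    #    pass in a real system; a Python loop here).
    p_dists = [target(list(prefix) + drafted[:i]) for i in range(k + 1)]
    # 3. Accept each drafted token with probability min(1, p(x) / q(x)).
    out: List[str] = []
    for x, p, q in zip(drafted, p_dists, q_dists):
        if rng.random() < min(1.0, p.get(x, 0.0) / q[x]):
            out.append(x)
        else:
            out.append(sample(residual(p, q), rng))  # fix the first rejection, then stop
            return out
    # 4. Every draft accepted: emit one bonus token from the target.
    out.append(sample(p_dists[k], rng))
    return out


p = {"a": 0.6, "b": 0.3, "c": 0.1}  # toy target distribution
q = {"a": 0.2, "b": 0.5, "c": 0.3}  # toy draft distribution
tokens = speculative_step(lambda ctx: p, lambda ctx: q, [], 4, random.Random(0))
```

Each step emits between 1 and k + 1 tokens for a single target-model pass. The speedup therefore depends on how often the draft agrees with the target.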


FFA Sora, video generation as fundus fluorescein angiography simulator

Wu, Xinyuan, Wang, Lili, Chen, Ruoyu, Liu, Bowen, Zhang, Weiyi, Yang, Xi, Feng, Yifan, He, Mingguang, Shi, Danli

arXiv.org Artificial Intelligence

Fundus fluorescein angiography (FFA) is critical for diagnosing retinal vascular diseases, but beginners often struggle with image interpretation. This study develops FFA Sora, a text-to-video model that converts FFA reports into dynamic videos via a Wavelet-Flow Variational Autoencoder (WF-VAE) and a diffusion transformer (DiT). Trained on an anonymized dataset, FFA Sora accurately simulates disease features from the input text, as confirmed by objective metrics: Fréchet Video Distance (FVD) = 329.78, Learned Perceptual Image Patch Similarity (LPIPS) = 0.48, and Visual-question-answering Score (VQAScore) = 0.61. Specific evaluations showed acceptable alignment between the generated videos and textual prompts, with a BERTScore of 0.35. Additionally, the model demonstrated strong privacy-preserving performance in retrieval evaluations, achieving an average Recall@K of 0.073. Human assessments indicated satisfactory visual quality, with an average score of 1.570 (scale: 1 = best, 5 = worst). This model addresses privacy concerns associated with sharing large-scale FFA data and enhances medical education.
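In a retrieval-based privacy check of this kind, each generated video is assumed to be embedded and used as a query against the real training videos. Recall@K is then the fraction of queries whose source video appears among the K most similar gallery items. A low value, such as the 0.073 reported, is the desirable direction, since it suggests generated samples are not near-copies of training data. The embedding model, K, cosine similarity, and the function name `recall_at_k` are all assumptions in this minimal NumPy sketch.

```python
import numpy as np


def recall_at_k(queries, gallery, true_idx, k):
    """Fraction of queries whose matching gallery item is among the k most
    cosine-similar gallery items. queries: (Q, D), gallery: (G, D)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                               # (Q, G) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]      # indices of the k best matches
    hits = (topk == np.asarray(true_idx)[:, None]).any(axis=1)
    return float(hits.mean())


# Toy embeddings: query 0 is closest to gallery item 0, and query 1 is closest
# to item 2 although its true source is item 1.
gallery = np.eye(3)
queries = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]])
r1 = recall_at_k(queries, gallery, [0, 1], k=1)  # 0.5
r2 = recall_at_k(queries, gallery, [0, 1], k=2)  # 1.0
```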


pyrtklib: An open-source package for tightly coupled deep learning and GNSS integration for positioning in urban canyons

Hu, Runzhi, Xu, Penghui, Zhong, Yihan, Wen, Weisong

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is revolutionizing numerous fields, with increasing applications of deep learning in Global Navigation Satellite System (GNSS) positioning algorithms for intelligent transportation systems (ITS). However, a significant technological disparity exists: traditional GNSS algorithms are often developed in Fortran or C, whereas deep learning tools are predominantly implemented in Python. To address this discrepancy, this paper introduces pyrtklib, a Python binding for the widely used open-source GNSS tool RTKLIB. This binding makes all RTKLIB functionalities accessible in Python, facilitating seamless integration. Moreover, we present a deep learning subsystem built on pyrtklib, a novel framework that leverages pyrtklib to accurately predict weights and biases within the GNSS positioning process. pyrtklib enables developers to quickly prototype and implement deep learning-aided GNSS algorithms, showcasing its potential to significantly enhance positioning accuracy.
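The abstract says the deep learning subsystem predicts weights and biases within the positioning process. One common place such quantities enter is a weighted least-squares (WLS) pseudorange solver. The NumPy sketch below shows such a solver on its own, assuming per-satellite weights and bias corrections are supplied from elsewhere (for example, by a network). It does not use pyrtklib or reproduce RTKLIB's API. The function `wls_position`, its arguments, and the synthetic satellite geometry are hypothetical. A real pipeline would also model atmospheric delays, satellite clock errors, and Earth rotation.

```python
import numpy as np


def wls_position(sat_pos, pseudoranges, weights, bias_corr, iters=20):
    """Gauss-Newton weighted least squares for receiver position and clock bias.

    sat_pos: (N, 3) satellite positions in metres (ECEF)
    pseudoranges: (N,) measured pseudoranges in metres
    weights: (N,) per-satellite weights (e.g. predicted by a network)
    bias_corr: (N,) per-satellite corrections subtracted from the pseudoranges
    Returns [x, y, z, clock_bias_m], iterating from the Earth's centre.
    """
    sat_pos = np.asarray(sat_pos, float)
    rho = np.asarray(pseudoranges, float) - np.asarray(bias_corr, float)
    W = np.diag(np.asarray(weights, float))
    x = np.zeros(4)
    for _ in range(iters):
        diff = x[:3] - sat_pos                    # receiver minus satellite
        ranges = np.linalg.norm(diff, axis=1)
        H = np.hstack([diff / ranges[:, None], np.ones((len(ranges), 1))])
        resid = rho - (ranges + x[3])             # measured minus predicted
        dx = np.linalg.solve(H.T @ W @ H, H.T @ W @ resid)
        x += dx
        if np.linalg.norm(dx) < 1e-6:
            break
    return x


# Synthetic scene: six satellites, a receiver on the Earth's surface, a 100 m
# clock bias, and a 250 m error on satellite 2 that the correction removes.
sats = np.array([[2.0e7, 1.2e7, 0.3e7], [1.8e7, -1.1e7, 0.9e7],
                 [2.2e7, 0.2e7, -1.3e7], [1.5e7, -0.4e7, -1.6e7],
                 [2.6e7, 0.5e7, 0.6e7], [1.6e7, 1.5e7, 1.2e7]])
truth = np.array([6.371e6, 0.0, 0.0])
clock = 100.0
error = np.array([0.0, 250.0, 0.0, 0.0, 0.0, 0.0])
pr = np.linalg.norm(sats - truth, axis=1) + clock + error
est = wls_position(sats, pr, np.ones(6), bias_corr=error)
```

Passing zeros for `bias_corr` instead leaves the 250 m error in the solution, spread across position and clock. Learned weights and corrections aim to shrink exactly this kind of error.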